Identification and functional validation of super-enhancers in Arabidopsis thaliana

Significance Super-enhancers (SEs) are exceptionally large enhancers and play prominent roles in cell type identity and function in mammalian species. We identified a set of 749 putative SEs in the model plant Arabidopsis thaliana. We demonstrate that the SEs share the functional characteristics of mammalian SEs.We developed several small deletions within an SE located in the middle of the thalianol biosynthetic gene cluster. The deletion lines show distinct phenotypic changes and transcriptional repression of all five thalianol cluster genes. Our results suggest that SEs play important roles in regulating genes associated with development and tissue identity and in coordinating the coexpression of gene clusters in A. thaliana.

We were intrigued by the possibility of SE-mediated transcriptional regulation in plants. We surveyed the genomic regions containing large clusters of accessible chromatin regions (ACRs) marked by DNase I hypersensitive sites (DHSs) in A. thaliana. We identified a set of 749 putative SEs that represent the top 2.5% of the largest ACR clusters without overlapping with predicted promoters. These putative SEs and their predicted cognate genes mirrored the functional characteristics of SEs and cognate genes reported in mammalian species. We provide several lines of evidence to indicate that these SEs and their cognate genes likely play a major role in organ development and tissue identity in A. thaliana. We developed CRISPR/Cas-mediated deletion lines of a SE associated with the thalianol biosynthetic gene cluster (BGC) (20,21). Remarkably, small deletions (131-157 bp) within the SE resulted in both phenotypic changes and transcriptional repression of all five thalianol cluster genes, providing evidence for a role for this SE in regulating the operon-like expression pattern of the five thalianol cluster genes.

Results
Putative SEs in A. thaliana. One of the key features of mammalian SEs is that they span a cluster of constituent enhancers and a large genomic region (16,17). To search for potential SEs in A. thaliana, we used the published DNase-seq datasets generated from 17 diverse tissues/ecotypes and developed a comprehensive accessible chromatin map (SI Appendix, Table S1). We identified a total of 84,925 ACRs using F-seq (22)

(see Materials and Methods).
Approximately 37% of these ACRs overlapped with 500-bp regions upstream of a transcription start site (TSS), and so are named promoter ACRs. The remaining ACRs are defined as nonpromoter ACRs. The nonpromoter ACRs span an average of 395 bp, shorter than the average 594 bp for promoter ACRs. A subset of nonpromoter ACRs were found to form large clusters spanning a big genomic region (SI Appendix, Fig. S1A). For example, a 6.8kb region on chromosome 4 contained a total of 13 independent ACRs derived from different DNase-seq data sets (Fig. 1A). The genomic regions flanking this ACR cluster include two wellstudied genes related to floral and leaf development, APETALA2 (AP2) (AT4G36920) and SPATULA (SPT) (AT4G36930) (23,24). We arbitrarily defined the nonpromoter ACR clusters that were larger than 1.5 kb (the top 2.5%) as SEs. A total of 749 SEs were identified using this criterion (see Materials and Methods) (Dataset S1). These SEs have an average size of 2.1 kb, ranging from 1.5 to 10.6 kb (Fig. 1B), and contain an average of five ACRs, which are also named the "constituent ACRs" of SEs (Dataset S1). "Clusters of enhancers", or clusters of ACRs, were reported previously in A. thaliana (12). A set of 94 genes in A. thaliana were found to be associated with a cluster of enhancers (12).

Genomic Positions and Features Associated with SEs in A.
thaliana. Among the 749 SEs, only 99 (13%) are located within genes. The majority (75) of these genic SEs span multiple introns and/or exons. However, 22 SEs are located within a single intron (SI Appendix, Fig. S1B). These SE-containing introns have an average size of 1.9 kb, which is significantly larger than the average intron size (165 bp) in A. thaliana. Interestingly, two SEs were located within an exon of genes AT5G17980 and AT2G27660, respectively (SI Appendix, Fig. S1C). Both AT5G17980 and AT2G27660 are single-exon genes.
The remaining 650 SEs are located in intergenic regions. The majority (95%) of these intergenic SEs are within 3.2 kb of a TSS or a transcription termination site (TTS) of their closest gene (Fig.  1C). A few SEs are distant from any flanking genes. For example, a SE was found to be 4.8 kb away from the TSS of gene GPAT6 (AT2G38110), which is important for tapetum development and fertility (26) (SI Appendix, Fig. S1D). Approximately half of the intergenic SEs are located to the 5′ end of at least one of the two flanking genes. However, 35 SEs are located to the 3′ ends of genes on both sides.
We next compared the DNase I sensitivity levels between the "constituent ACRs" within SEs and the rest of the nonpromoter ACRs. The constituent ACRs showed a higher level of DNase I sensitivity in all DNase-seq samples analyzed (Fig. 1D), which is similar to the high level of chromatin accessibility associated with SEs in mammalian species (16,17). In addition, the SE-cognate genes (genes that are the closest to an SE, see Materials and Methods) showed a higher level of expression in most tissues compared with genes associated with other nonpromoter ACRs ( Fig. 1E and SI Appendix, Table S2), which is consistent with the higher level of DNase I sensitivity associated with constituent ACRs within SEs.
Chromatin Organization of SEs. Interphase chromosomes form kilobase-to megabase-sized topologically associating domains (TADs). These TADs generate isolated gene regulatory domains in which enhancers may regulate multiple genes but their effects on genes in neighboring TADs are blocked (27). Although TADs are not a dominant chromatin feature in the A. thaliana genome, TAD-like structures can be identified in high-resolution 3D chromatin studies (21,28,29). Approximately 1,000 TAD-like regions were identified at 2-kb resolution (28).
We explored the association between these TADs and SEs. We first analyzed three well-characterized TADs that contain coregulated metabolic gene clusters (21). At least one SE was identified in each of these three TADs (SI Appendix, Fig. S3). For example, the thalianol gene cluster is located within a TAD spanning 19.4-19.5 megabases on chromosome 5 ( Fig. 2A) and includes five genes: thalianol acyltransferase 2 (THAA2), thalianol acyltransferase 1 (THAA1), thalianol oxidase (THAO), thalianol hydroxylase (THAH) and thalianol synthase (THAS) (21,30). An SE was identified in between the 3′ end of THAO and the 3′ end of THAH. This SE spans 3.6 kb and exhibits the highest level of chromatin accessibility in root tissue, which is in accordance with the fact that the thalianol genes are highly expressed in roots (31). Analysis of published Hi-C datasets (21) revealed multiple interactions between this SE and the thalianol genes in root tissue ( Fig. 2A).
To test whether SEs are enriched within TADs, we calculated the percentages of SEs, promoter ACRs, and other nonpromoter ACRs that are located within TADs. We generated 1,000 random nongenic regions with the same sizes as the three types of ACRs to obtain the empirical distribution of percentage of regions located within TADs. We found that 49.7% of SEs are located within TADs (Dataset S1), significantly higher than for random nongenic regions (P = 0.003, empirical test). In contrast, promoter ACRs and other nonpromoter ACRs were not enriched in TADs (Fig. 2B). Enrichment of SEs in TAD-like chromatin domains in A. thaliana supports the hypothesis that an isolated chromatin domain covering an SE and the SE-cognate genes may facilitate precise regulation and block interference with/from neighboring genes and CREs (18).
SEs and their cognate genes tend to be located in gene desert regions in mammalian species (18). For example the Sox2 gene, which is essential for pluripotency in mammals, and its SE are located within a 1.5-megabase gene desert (32). Location within low-gene density regions of the genome may facilitate the insulation of the SEs and their cognate genes. To test whether the SEs in A. thaliana are also located in regions with a relatively low gene density, we calculated the gene density of the A. thaliana genome using 15-kb sliding windows with 5-kb steps. Since the large sizes of SEs may impact the calculation of gene density, the lengths of SEs were subtracted in each window during gene density calculation (see Materials and Methods). We found that genomic regions containing SEs exhibited a significantly lower gene density than promoter ACRs and other nonpromoter ACRs (P < 2.2 × 10 −16 , one-sided Wilcoxon-Mann-Whitney test) (Fig. 2C).

Contribution of SEs and Their Cognate Genes to Tissue Identity
in A. thaliana. In mammalian species, SE-cognate genes identified in a specific cell type are highly enriched for the biological processes that define that cell type (3,19). We, therefore, next explored the biological function of the SEs and their cognate genes in A. thaliana. We first examined whether the SEs are bound to specific TFs using published chromatin immunoprecipitation (ChIP)-seq data for 52 TFs in A. thaliana (33). We calculated the numbers of TFs bound to the constituent ACRs of SEs, promoter ACRs, and other nonpromoter ACRs. We found that the constituent ACRs were bound by the highest number (density) of TFs (Fig. 3A), and this was not due to the size of SEs or their distances from genes (SI Appendix, Fig. S4). We next asked if the binding sites of any TFs are enriched within the SEs. Enrichment was analyzed by calculating the log-odds ratio of each TF in pairwise comparisons among the three different classes of ACRs (see Materials and Methods). Several TFs were enriched within the constituent ACRs (Fig. 3B). Many of these TFs, such as APETALA1 (AP1) and JAGGED (JAG) (Fig. 3B), are key regulators of tissue development and organ identity. AP1 is involved in specifying the identity of floral meristem and regulates petal development (34). JAG plays an important role in controlling proper lateral organ shape of leaves, sepals, petals, and stamens (35).
We also analyzed the enrichment of the TF-binding motifs derived from DNA affinity purification sequencing (DAP-seq) in A. thaliana (36). Among the 529 TFs analyzed by DAP-seq, 47 were enriched within the constituent ACRs. The Homeobox, zinc finger homeodomain (ZFHD) and squamosa promoter binding protein (SBP) TF families were especially overrepresented (Fig. 3C). The Homeobox and ZFHD TF families are involved in meristem differentiation and organ identity (37,38). The SBP TFs, including squamosa-promoter binding protein-like (SPL) 3, participate in floral development (39,40).
We next used gene ontology (GO) enrichment to analyze the functions of the SE-cognate genes in A. thaliana. It has long been a challenge to accurately identify genes cognate to each enhancer. A recent study in mammalian species showed that distance-based methods perform adequately compared with the correlation-based approaches and machine-learning methods that rely on genomic and epigenomic datasets (41). Thus, we simply assigned the closest gene to each SE as the putative cognate gene. A total of 722 genes were assigned to the 749 SEs. Compared with promoter ACRs, SE-cognate genes were enriched with "biological processes", including "regulation of molecular function" and "multicellular organism development" (SI Appendix, Table S3). SE-cognate genes were also enriched with the same GO terms when compared with genes associated with the other nonpromoter ACRs (SI Appendix, Table S3). Thus, the SE-cognate genes are distinctly enriched in functions related to development.
Collectively, these results showed that SEs as well as their cognate genes play a major role in organ development and tissue identity in A. thaliana, mirroring the functional characteristics of SEs and their cognate genes identified in mammalian species.

Evolutionary Conservation of the A. thaliana SEs in Brassicaceae
Species. We were interested in the degree of conservation or divergence of the SE-related DNA sequences during evolution. We chose five Brassicaceae species that have different phylogenetic distances to A. thaliana, for our evolutionary study (Fig. 4B). To map the orthologous sequences of the SEs in the five species, we first identified syntenic gene pairs between A. thaliana and the five selected species (see Materials and Methods). The sequence of each SE was then aligned to the sequence between the two genes of a syntenic gene pair in a target species (Fig. 4A). We identified 17,364 syntenic gene pairs in at least one of the five species. Of the 749 SEs identified in A. thaliana, 498 (66.3%) were found to be located within a syntenic gene pair in at least one of the five Brassicaceae species. We reasoned that if the SEs identified in A. thaliana have an impact on transcription of their neighboring genes, we would expect strong synteny conservation of the gene pairs spanning SEs. To test this hypothesis, we calculated the number of species that maintained synteny of gene pairs spanning SEs. We found that 70% of the 498 gene pairs maintained synteny in at least five of the six species analyzed, which is significantly higher that gene pairs without associated SEs (50%, P < 1.0 × 10 −3 , empirical test). We next asked how the SE sequences evolved during the evolution of the Brassicaceae species. To our surprise, most of the A. thaliana SEs are highly conserved among the Brassicaceae species. For example, 81% (312 of 384) of the A. thaliana SEs spanned by syntenic gene pairs showed homology to the orthologous sequences in Brassica rapa, which diverged from the common ancestor to A. thaliana approximately 43 Mya (43) (Fig. 4B). In contrast, only 48% of the promoters of the same syntenic genes showed homology to the promoters of the orthologous genes in B. rapa (SI Appendix, Fig. S5). The sequence identity (portion of matched nucleotides over aligned regions, see Materials and Methods) of the SEs did not change linearly with the phylogenetic distance (Fig. 4C). The sequence identity of SEs clearly dropped after about 16 My of divergence, reduced from 71% in Arabidopsis lyrata to 48% in Capsella rubella (Fig. 4C). However, the sequence identity of SEs remained at 46% and 45% in Arabis alpina and B. rapa, although these two species split from the common ancestor of A. thaliana about 43 Mya. The maintenance of sequence identity among distantly related species indicates that the ancestral DNA associated with the SEs may contain core regulatory sequences that are biologically significant and, therefore, have been maintained during evolution. To test this hypothesis, we identified all TF-binding motifs in the SEs of A. thaliana and calculated the motif retention rate based on the percentage of these motifs that also exists in the orthologous sequences in other Brassicaceae species (see Materials and Methods). The motif retention rate was more stable across the species compared with sequence identity (Fig. 4C). Indeed, 73% (363 of 498) of SEs were found to contain conserved noncoding sequences (CNSs) (44).
For example, an SE containing four constituent ACRs was found upstream of gene ILI1 BINDING BHLH 1 (IBH1) (Fig. 4D) encoding a basic helix-loop-helix (bHLH) TF that regulates cell elongation via the brassinosteroid pathway (42, 45 ,46). Four conserved sequence blocks (A, B, C, and D) were identified within the SE by multiple sequence alignment (Fig. 4D). Several conserved TF-binding motifs were identified within each conserved sequence block, including BRASSINAZOLE-RESISTANT 1 (BZR1), CYCLING DOF FACTOR 5 (CDF5), ETHYLENE RESPONSE FACTOR 11 (ERF11) and DNA BINDING WITH ONE FINGER 5.5 (DOF5.5). Of These TFs, BZR1, whose motif located within block A, is a known regulator of IBH1 (42). Of note, only the sequence conservation of SEs within Brassicaceae was analyzed in this study. The function of the orthologous sequences -for example, chromatin accessibility and preservation of enhancer functionality -of SEs in other species is unknown. Thus further studies are required to decipher the functional conservation of SEs within Brassicaceae species.

Genomic Disruption of a SE Associated with the Thalianol
Gene Cluster. We selected the SE associated with the thalianol gene cluster for functional validation. Analysis of Hi-C datasets suggested that this SE interacts with multiple genes in root tissue ( Fig. 2A). Thus, we predicted that modification of this SE would alter the pattern of the root-specific expression of multiple genes within the cluster. This SE (3,578 bp) contains putative binding sites for 21 different TFs that are known to be associated with root development (SI Appendix, Table S4). We attempted to delete a 278-bp region that showed the highest level of DNase I accessibility (in root tissue) within the SE (Fig. 5A). This region contains predicted binding sites for five TFs related to root development, including WIP DOMAIN PROTEIN 5 (WIP5), HOMEODOMAIN 13 (HB13), JACKDAW (JKD), HOMEOBOX ARABIDOPSIS THALIANA 2 (HAT2) and ENHANCED DROUGHT TOLERANCE 1 (EDT1) (47-52) (Fig. 5A). Most interestingly, HB13 and EDT1 are considered as negative regulators of root length (50,51). Three guide RNAs (gRNAs) were designed for the development of two short deletions that span the HB13 and EDT1 binding sites, respectively (Fig. 5A).
We obtained two deletion lines (#9 and #46) that had lost a 158-bp region spanning the HB13 binding site, and a further two deletion lines (#192 and #202) lacking a 131-bp region spanning the EDT1 binding site (Fig. 5A). Interestingly, the deletion lines had longer primary roots compared with wild-type plants during early development (Fig. 5 B and C), a phenotype similar to those reported for mutations of the thalianol cluster genes (20) and also mimicking the phenotypes of hb13 and edt1 (50, 51). The phenotypes of the elongated primary roots are also correlated with losses of the binding sites of HB13 and EDT1, the two negative regulators for root length.
THAO and THAA1, the two genes upstream of the SE, became down-regulated in all deletion lines. In contrast, THAH and THAS, the two genes downstream of the SE, were down-regulated only in #192, which is closer to these two genes than deletion #46 (Fig. 5D). THAA2, which is the most distant gene from the SE, was down-regulated in #46 but not in #192. We also examined the expression of two intervening nonthalianol genes, AT5G47960 and AT5G47970, within the cluster (Fig. 5A). Interestingly, the transcription of both genes was significantly reduced in #46 (SI Appendix, Fig. S6). In contrast, gene AT5G45390, which is located outside of the TAD that spans the thalianol gene cluster, was not changed in the deletion lines (SI Appendix, Fig. S6). Sequence analysis revealed two small duplications in the SE region. A 295-bp region that spans the two CRISPR/Cas deletions shares 82% sequence similarity with a 279-bp downstream region within the SE (Fig. 6). Interestingly, a 560-bp region shares 86% sequence similarity with a 536-bp region located in a 1,079-bp ACR upstream of the SE (Fig. 6). This upstream ACR was separated from the SE by a 1,101-bp DNA transposon (VANDAL 18NA). This transposon is universally present in >1,000 sequenced A. thaliana accessions (53). Therefore, this ACR is likely part of the SE but was excluded computationally from our default SE identification system. The two small duplications would appear to be a single ~850-bp tandem duplication before the insertion of the transposon (Fig. 6). This duplication could have been induced by the transposition event. Several A. thaliana accessions have been sequenced with reference genome quality (54). Analysis of the intergenic sequences between genes THAO and THAH in these accessions revealed additional insertions of DNA transposons (MuDR and VANDAL) in this region, including an additional VANDAL copy in the middle of the SE in the An-1 ecotype (Fig. 6).
We identified two transfer DNA (T-DNA) lines inserted in the SE region. The T-DNA of SALK_149827 inserted around 194,36,952 bp on chromosome 5 (the exact position is not determined). This T-DNA is approximately 203 bp away from the TTS of gene THAO (Fig. 7A). The T-DNA of SALK_093786 inserted at 194,38,395 bp (exact position is determined). This T-DNA is located in the middle of the 1,079-bp ACR and is 1,646 bp away from the TTS of THAO (Fig. 7A). In both lines, transcription levels of the THAO gene were drastically reduced. Furthermore, significant transcriptional repression was observed for the THAA1 and THAH genes in line SALK_149827 (Fig. 7B) and the THAA1 and THAS genes in line SALK_093786. Remarkably, we also detected a significant upregulation for the genes THAA2 and THAH in line SALK_093786 (Fig. 7C). The mis-regulation of the thalianol cluster genes in both lines with disrupted SE structure further supports the important role of the SE in cluster transcription. In contrast, insertion of T-DNA into individual thalianol cluster genes resulted in repression of the targeted gene only and had no impact to the rest of the genes in the cluster (20).

Discussion
The "SE" concept was developed in mammalian species more than 10 y ago (3,16,17). The functional relevance of SEs, however, is still under discussion. There is some controversy about whether SEs have fundamentally different functions compared with regular enhancers (55,56). Nevertheless, in addition to their large sizes, SEs in mammalian species have several features that distinguish them from regular enhancers, including their association with high levels of transcriptional coactivators, histone mark H3K27ac, and chromatin accessibility, and their recruitment of high concentration of TFs (17,57). Most importantly, SEs have been recognized to regulate genes that have prominent roles in cell type identity and function in mammalian species (58). We characterized a set of the 749 largest ACR clusters in the A. thaliana genome. These SEs are preferentially associated with TAD-like chromatin domains. Our analyses also reveal that both the SEs and their cognate genes play roles in organ development and tissue identity in A. thaliana. Thus, the A. thaliana SEs mirror the functional characteristics associated with mammalian SEs.
Although the individual constituent enhancers within SEs have often been presumed to act synergistically (59), deletion mapping of several SEs in mammalian species suggest that individual enhancers within the clusters are usually functionally redundant (60) and can act in an additive manner (61,62). Analysis of the SE associated with the thalianol gene cluster revealed a similar functional redundancy of the constituent enhancers. The duplicated regions within the SE showed more than 80% sequence homology and shared putative binding sites for TFs related to root development, suggesting that the duplicons have similar functions. However, in-depth deletion mapping will be required to confirm the functional redundancy of these duplicated sequences.  Rapid evolution of enhancers is a universal feature of mammalian genomes. Sequence analysis across 20 mammalian species showed that enhancers have evolved appreciably more rapid than promoters (63). However, SEs associated with developmentally essential genes can be highly conserved among distantly related mammalian species (59,64). Nearly 42% of zebrafish SEs are located in close proximity to orthologs that are also associated with SEs in mouse and human despite the low overall SE sequence conservation (65). Therefore, the conservation of SEs, in contrast to the rapid evolution of normal enhancers, may be related to the conservation of biologically important CREs within the SEs (66,67) and to the evolutionary stability of their cognate genes (65,68). We demonstrate that the A. thaliana SEs show high-level conservation among the Brassicaceae species. These SEs are even more conserved than the corresponding promoters based on analysis of the syntenic gene pairs in different species (SI Appendix, Fig. S5). Thus, the A. thaliana SEs show a similar evolutionary pattern to those reported in animal species.
The raison d'etre for the existence of BGCs in plant genomes is intriguing. These clusters have features in common with bacterial operons (physical clustering and co-expression) but, unlike operons, each gene has its own promoter (69,70). Analysis of chromosome conformation capture data has revealed that active BGCs are often confined within a TAD identified in the tissue where the cluster is expressed (21). By contrast, inactive BGCs are associated with heterochromatin in tissues where the cluster is not expressed (21). Coregulation is therefore likely achieved epigenetically within a large chromatin domain spanning an entire BGC. In the current study, we demonstrate that SEs may play a central role in the coregulation of these operon-like gene clusters. Deletions of a small portion of the SE can cause significant repression of all five genes of the thalianol cluster (Fig. 5D). The partial suppression of the genes observed in the deletion lines is likely due to the fact that the deletions (131-157 bp) account only a very small portion of the large SE as well as the presence of duplicated sequences related to the deleted DNA segment with the SE. It is worthy to note that additional intergenic ACRs are present in the thalianol gene cluster. Further dissection of these ACRs will be required to know if they play a role in the coregulation of the entire gene cluster, or in fine-tuning individual genes in the cluster.
Alteration of all five genes in the deletion lines suggests that the SE likely interacts with the promoters of all five genes (Fig. 5E). This hypothesis is supported by the fact that T-DNA insertions in the SE region can alter the transcription of all five thalianol cluster genes (Fig. 7). In contrast, T-DNA insertions into individual thalianol cluster genes only alter the expression of the target gene (20). The five thalianol genes have completely different promoter sequences. Interestingly, the SE and the five promoters share binding sites of many of the TFs related to root development (SI Appendix, Table S4). We have recently observed a similar phenomenon of shared TF binding sites between intronic enhancers and their cognate promoters in A. thaliana (15). Thus, the SE may interact with all five promoters via transcriptional co-activators as well as the common TFs (SI Appendix, Fig. 5E). We identified at least one in each of three other well-studied BGCs (SI Appendix, Fig. S3), suggesting that the SE-mediated system may be a common mechanism for the coregulation of biosynthetic gene clusters in plants.

Materials and Methods
Identification of ACRs and SEs. A total of 17 DNase-seq datasets were obtained from NCBI Short Read Achieve. DNase-seq reads were mapped to A. thaliana TAIR10 genome using BWA aln (71) with default parameters and then convert to BAM format by SAMtools (72). Sequence reads with more than 20 mapping quality were retained for further analysis. ACRs were identified from each sample using F-seq (22). To determine the false discovery rate (FDR) for each sample, the same number of random reads as DNase-seq was generated 10,000 times and were called ACRs using F-seq. ACRs with FDR less than 0.001 were kept for further analysis. A union set of ACRs were generated by merging ACRs from all samples. These merged ACRs were classified into promoter ACRs, intergenic ACRs, and constituent ACRs of SEs. Promoter ACRs were defined as ACRs that overlap with 500 bp upstream of a TSS. The remaining nonpromoter ACRs were clustered if they are less than 50 bp apart. The top 2.5% of longest nonpromoter ACR clusters were defined as SEs. The individual ACR within each SE were defined as constituent ACRs. DNase I sensitivity of ACRs was calculated as the number of reads mapped to ACRs normalized by the total number of mapped reads in each sample. The genomic positions of three types of ACRs were determined by comparing the genomic coordinates of the ACRs to TAIR 10 gene annotation using BEDTools (73). To calculate gene density, genome was divided in to 15-kb windows with a step of 5 kb. Gene density for each window was calculated below: Chromatin Interaction Associated with SEs. Hi-C data from leaves and roots (SRP224678) (21) were analyzed using FAN-C (74) with parameters "--restriction-enzyme BglII" for "fanc map" and "-l -us -m -p 1 -r BglII -q 20 -S" for "fanc pairs". The interaction frequencies were normalized by HiCKRy from FitHiC2 (75) with the Knight-Ruiz algorithm. Chromatin loops were identified using FitHiC2 with parameters "-r 2,000 -m 5 -U 10,00,000." Interactions with FDR less than 1 × 10 −10 were retained for further analysis. One-sided Wilcoxon-Mann-Whitney test was conducted in R.
TFs Occupancy and Binding Motif Analysis. Peaks of the binding of 52 TFs were obtained from PlantPan3 (33). TF binding associated with three types of ACRs was analyzed using BEDTools. An ACR is bound by a TF if more than 50% of the ACR was covered by TF binding peaks. The odds ratio of each TF that bound constituent ACRs of SEs comparing with nonconstituent ACRs was calculated below: where non-constituent ACRs were promoter ACRs or intergenic ACRs. TF-binding motifs within ACRs were identified using FIMO (25) with default parameters against Plant Cistrome Database (36). TF motif retention rate is calculated as the percentage of TF-binding motifs identified in A. thaliana that also exists in the orthologous sequences in other Brassicaceae species. GO enrichment was conducted using goenrich (https://github.com/jdrudolph/ goenrich). Conserved noncoding sequences (CNSs) were obtained from published report (44).

Expression of Genes associated with SEs and Other Nonpromoter ACRs.
A gene that is the closest to an SE or to a nonpromoter ACR is defined as the gene cognate to the SE or to the nonpromoter ACR. Gene expression data were obtained from a previous study (76) and the TraVA database (travadb.org) (77,78). The significance of difference of the expression levels between SE-cognate genes and nonpromoter ACR-cognate genes in each tissue was tested using one-sided Wilcoxon-Mann-Whitney test in R (4.0.4).

Sequence Analysis and Evolution of SEs.
To analyze the sequence variation of the SE associated with the thalianol gene cluster, we mapped the VANDAL 18NA insertions in the 1,001 genome sequence database (53). The genomic sequencing data of 1,135 ecotypes were downloaded from NCBI Short Read Archive (SRP056687). Sequencing data from each ecotype were aligned to 1) left joint point (
Syntenic gene between A. thaliana and other Brassica species (A. lyrata, Boechera stricta, C. rubella, A. alpina, and B. rapa) were identified using SynMap2 (82). Pairs of genes that upstream and downstream of a SE in A. thaliana were extracted. Syntenic gene pairs were defined as gene pairs that are located within synteny blocks in any one of the species were retained. To identify orthologous sequences of SEs in other Brassica species, we required that the syntenic gene pairs in target species are also adjacent pairs without insertion of any genes between them. SEs in A. thaliana were aligned to the sequences between the genes of syntenic gene pairs in target species using BLAT with parameter "-minIdentity = 70". Motifs within SE orthologous sequences were identified using FIMO (25) with default parameters against Plant Cistrome Database (36). Conserved noncoding sequences (CNSs) were obtained from published study (44). Sequence identity is calculated as the percentage of matched nucleotides over the aligned region.  Table S6). The three gRNA expression cassettes driven by AtU6-26 were tandemly inserted into CAMBIA1300-pYAO:Cas9 vector through SpeI/NheI digestion (84). Homozygous deletion lines were generated by transforming wild-type A. thaliana (Col-0) plants with A. tumefaciens GV3101 via floral dipping as previously described (15). All homozygous deletion lines were finally confirmed by PCR using primers flanking the region (F-5′CTTCGAGAAGATAATGATGTATGG3′; R-5′TAAAAATGCGGTAGGTGCTAAC3′).
To quantify the primary root length, seeds of the homozygous deletion liens and wild-type Col-0 were germinated on square Petri dishes containing solid MS and kept in darkness at 4°C. The plates were transferred to a growth chamber with 16-h d and 8-h night at 22°C and positioned vertically. The 4-d-old seedlings were imaged by cannon 6D Mark II digital camera (Cannon, Japan) and were measured for their primary root length using Image J (85). Total RNAs were extracted from roots harvested from 4-d-old seedlings using the RNeasy Plant Mini Kit (Qiagen, Cat. # 74904). cDNA was synthesized using PrimeScript™ RT reagent Kit with gDNA Eraser (Takara, Cat. # RR047A). Quantitative real-time PCR (qPCR) was performed with gene-specific primers (SI Appendix, Table S7) using a total volume of 20 μl of TB Green® Fast qPCR Mix (Takara, Cat. # RR430A) on a LightCycler 480 (Roche) system (Roche, Switzerland). AtUBC21 was used as a reference gene. Experiments were taken for three technical replicates in three biological replicates.